ABSTRACT

A case study of the Florida Teacher Certification
Examination (FTCE) program is described to assist others launching
the development of large scale item banks. The FTCE has four subtests:
Mathematics, Reading, Writing, and Professional Education. Rasch
calibrated item banks have been developed for all subtests except
Writing. The methods used to evaluate the stability of the score
scale as it relates to a cut score based upon a logit value are
explained. Cutoff levels were established in terms of a logit
ability scale based on field test data. Linking items were chosen so
that their difficulty levels were centered at the cut score. Item
selection was based upon closeness of fit to the Rasch model and item
difficulty. The linking items for the Professional Education subtest
were of primary concern. Results showed no bias in the test due to
differences in curriculum. (DWH)
The Use of Precalibrated Item Bank to Establish and
Maintain Cutoff Scores: A Case Study of the Florida 
Teacher Certification Examination 

Applications of Rasch analysis to test development are
attracting considerable interest. Since relatively few measurement
specialists have experience in the actual development and
maintenance of precalibrated item banks, the case study of the
Florida Teacher Certification Examination program may be helpful
to others launching large scale item banks. The measurement problems
of field test design, item independence, scale score stability,
and setting cutting scores are all confronted in this case study.

Description of the Item Bank 

The Florida Teacher Certification Examination (FTCE) has four
subtests: Mathematics, Reading, Writing and Professional Education.
Rasch calibrated item banks have been developed for all but the
Writing subtest. The original item banks were calibrated using
a field test design in which seven test forms were administered.
Each form included items grouped into the three calibrated subtests.
Three sets of linking items were woven into each form. Test forms
were equated in an anchor design by adding the linking constant
to the unadjusted item logit difficulties.
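A minimal sketch of the anchor equating step, assuming each form's linking items have both a bank (base scale) difficulty and an unadjusted difficulty from that form's own calibration. Consistent with the averaging described in this section, the linking constant is taken here as the mean difference between the two sets of values and is then added to every unadjusted logit on the form. The function names and item labels are hypothetical.

```python
from statistics import mean


def linking_constant(bank: dict[str, float], form: dict[str, float]) -> float:
    """Mean of (bank - form) over the linking items present in both calibrations."""
    common = bank.keys() & form.keys()
    if not common:
        raise ValueError("no linking items in common")
    return mean(bank[i] - form[i] for i in common)


def equate_form(form: dict[str, float], constant: float) -> dict[str, float]:
    """Add the linking constant to every unadjusted item logit on the form."""
    return {item: b + constant for item, b in form.items()}


# Hypothetical example: two linking items (L1, L2) and one new item (N1)
bank = {"L1": 0.50, "L2": -0.20}
form = {"L1": 0.30, "L2": -0.40, "N1": 1.00}
c = linking_constant(bank, form)   # about 0.20
adjusted = equate_form(form, c)    # N1 moves to about 1.20 on the bank scale
```

Because a Rasch calibration is identified only up to an additive constant, a single shift of this kind is all that is needed to place a form on the bank scale.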

Experimental items have been added to the bank on two occasions
after field testing in the regular April and August administrations.
The practical problems of cost efficiency and adequacy of measurement
were addressed by using a common set of scored items across
forms, with a different set of experimental items in each form.
Each form was calibrated separately, and the linking constant
was derived by averaging the differences in difficulty estimates
of the scored items; the adjusted logit values of the scored items
then became the base for linking the experimental items. This
procedure was a variation of the method described by Ryan in
Equating New Test Forms to an Existing Test. By creating an average
logit value for each item across forms and then computing the average
difference between this set of items and their base values in the
item bank, an adjustment factor is established to link the
experimental items in each form to the item bank.
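The variation described above can be sketched as follows, assuming every form carries the same scored items and was calibrated separately. For each scored item an average logit across forms is formed, and the mean difference between the bank values and those averages is the adjustment factor. In this sketch each form's own offset from the cross-form average is also removed before the factor is applied; that per-form step is our assumption about how the separate calibrations were brought together. All names are hypothetical.

```python
from statistics import mean


def link_experimental_items(bank, forms, scored):
    """
    bank:   {item: base logit} for the common scored items
    forms:  {form_id: {item: unadjusted logit}}; every form holds all scored
            items plus its own experimental items
    scored: ids of the common scored items
    Returns {form_id: {experimental item: logit on the bank scale}}.
    """
    scored = set(scored)
    # Average logit of each scored item across all forms
    avg = {i: mean(f[i] for f in forms.values()) for i in scored}
    # Adjustment factor: mean difference between bank values and the averages
    factor = mean(bank[i] - avg[i] for i in scored)
    linked = {}
    for fid, f in forms.items():
        # Offset that moves this form's scored items onto the cross-form average
        offset = mean(avg[i] - f[i] for i in scored)
        linked[fid] = {
            item: b + offset + factor
            for item, b in f.items()
            if item not in scored
        }
    return linked


bank = {"S1": 0.2, "S2": 1.2}
forms = {
    "A": {"S1": -0.1, "S2": 0.9, "E1": 0.5},
    "B": {"S1": 0.1, "S2": 1.1, "E2": -0.5},
}
linked = link_experimental_items(bank, forms, ["S1", "S2"])
```

Since the per-form offset plus the common factor equals the mean of (bank - form) over the scored items, the result matches linking each form directly to the bank; the averaging step makes the shared adjustment factor explicit.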

A computer based history bank has been developed to trace 
the stability of the item logits across administrations. The 
bank includes the field test logit as well as the unadjusted 
and adjusted logits from each administration. Fit statistics
are also included for each item. The bank is used to update
another item file, which contains the content codes and other
attributes associated with each item. The attribute bank 
is organized by the content codes to allow the test developer 
to monitor the number and difficulty of items within each 
content area. 
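A minimal sketch of the two files, assuming a simple in-memory layout: a history record per item holding the field test logit plus the unadjusted logit, adjusted logit and fit statistic from each administration, and a separate mapping from item to content code standing in for the attribute file. The summary groups items by content code so the count and mean difficulty in each area can be monitored. The field names, and treating the latest adjusted logit as the current value, are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Administration:
    label: str          # e.g. "November"
    unadjusted: float   # logit from that administration's own calibration
    adjusted: float     # logit after adding the linking constant
    fit: float          # item fit statistic reported by the calibration


@dataclass
class ItemHistory:
    item_id: str
    field_test: float
    administrations: list[Administration] = field(default_factory=list)

    def current_logit(self) -> float:
        # Assumption: the latest adjusted value is the one carried forward
        if self.administrations:
            return self.administrations[-1].adjusted
        return self.field_test


def summarize_by_content(histories, content_codes):
    """Return {content code: (number of items, mean current logit)}."""
    groups = defaultdict(list)
    for h in histories:
        groups[content_codes[h.item_id]].append(h.current_logit())
    return {code: (len(v), mean(v)) for code, v in groups.items()}
```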

The techniques and problems described in this paper are
familiar to those measurement specialists charged with the
development and maintenance of large scale precalibrated item
banks. Many of these procedures have been presented in papers
by Ryan, Mead, Ingebo and others. The contribution of this
paper is the extension of these methods to evaluate the
stability of the score scale, particularly as it relates to a
cut score based upon a logit value.

Setting Cut Off Levels 

The issue of the stability of the scale is related directly
to the fact that all passing scores were set in terms of the logit
ability scale based on the field test data. Using the method
described in a companion paper, passing scores were set at 1.4
logits for reading, 1.0 logits for mathematics and .25 logit
for professional education. Since a higher value on the logit
ability scale corresponds to a higher level of proficiency, the
reading standard requires the highest level of proficiency and the
professional education standard the lowest. Normally, the higher
the passing score, the greater the number of items that a
candidate must answer correctly. In a calibrated item bank, however,
the score standard need not be tied to a specific number of correct
responses. When the cutoff is defined as an ability logit, the raw
score equivalent to it may vary with the difficulty of the items on
a given form.
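Under the Rasch model, the expected number correct for a candidate at ability θ is the sum over items of exp(θ - b_i)/(1 + exp(θ - b_i)). The sketch below uses this test characteristic curve to show how the same logit cutoff maps to different raw scores on forms of different difficulty. The 0.25 cutoff is the professional education standard from the text; the two 80-item forms are made up for illustration.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Rasch probability of a correct response at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_raw_score(theta: float, difficulties: list[float]) -> float:
    """Expected number correct at ability theta (test characteristic curve)."""
    return sum(p_correct(theta, b) for b in difficulties)


cut = 0.25                 # professional education cutoff in logits
easier_form = [-0.5] * 80  # hypothetical form, every item at -0.5 logit
harder_form = [0.0] * 80   # hypothetical form, every item at 0.0 logit

print(round(expected_raw_score(cut, easier_form), 1))  # 54.3
print(round(expected_raw_score(cut, harder_form), 1))  # 45.0
```

The logit standard is the same for both forms, but about 54 correct answers are needed on the easier form and about 45 on the harder one.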

Linking Design 

Linking items were selected based upon their closeness of
fit to the Rasch model and their difficulty. In order to
enhance the reliability of the cutoff, linking items were
chosen so that their difficulties were centered at the cut score.
The number of linking items was made proportional
to the number of items in each content area within subtests.
This design required a large total number of linking
items: approximately twenty percent of the items
were designated as links.
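A sketch of this selection rule, under assumptions the text does not spell out: links are allocated to content areas in proportion to their item counts using largest-remainder rounding, about twenty percent of the items overall, and within each area the items with acceptable fit whose difficulties lie closest to the cut score are chosen. The tuple layout, the standardized fit statistic and the 2.0 misfit limit are hypothetical.

```python
import math


def allocate_links(area_sizes: dict[str, int], share: float = 0.20) -> dict[str, int]:
    """Split round(share * total) links across areas in proportion to size."""
    total = sum(area_sizes.values())
    n_links = round(share * total)
    raw = {a: n_links * n / total for a, n in area_sizes.items()}
    alloc = {a: math.floor(x) for a, x in raw.items()}
    # Give the remaining links to the areas with the largest fractional parts
    leftover = n_links - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda a: raw[a] - alloc[a], reverse=True)
    for a in by_remainder[:leftover]:
        alloc[a] += 1
    return alloc


def choose_links(items, area, k, cut, max_misfit=2.0):
    """items: list of (item_id, area, difficulty, fit_t). Returns up to k ids."""
    candidates = [it for it in items if it[1] == area and abs(it[3]) <= max_misfit]
    candidates.sort(key=lambda it: abs(it[2] - cut))  # closest to the cut first
    return [it[0] for it in candidates[:k]]
```

For example, areas of 47, 33 and 20 items receive 9, 7 and 4 links.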


Establishing the Scale

A conservative approach to test development was taken for
the Teacher Certification Examination. Attempts were made to
stabilize the difficulty of each examination by selecting items
with an average logit value of zero. In this way, each
test administration should have forms that are approximately
equal in difficulty. Small variations in the difficulty of
individual examinations are corrected through the linking
procedures to the base scale derived from the field test.
However, the flexibility to adjust the number of items
in a given subtest, or to let the raw score equivalent of
the cutoff vary with the difficulty of the items, is a
decided advantage, particularly while the item banks are being
built and the number of items in given content areas may be
small.

The problem of reducing the number of items within subtests
did occur after the November 1981 administration. It
was apparent that the testing time could not be extended in
order to include experimental items, and a separate administration
of experimental items was too cumbersome and costly. The
decision was made to reduce the Professional Education subtest
from one hundred items to eighty items. The Reading subtest
was reduced by one passage and ten items. This change was
made without altering the score scale: the ability scale
derived from eighty or one hundred items was the same.
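Because Rasch ability estimates are expressed on the logit scale fixed by the bank difficulties, dropping items changes the raw-score-to-logit conversion but not the scale itself. A minimal sketch of that conversion, assuming maximum likelihood estimation with Newton-Raphson steps; the difficulties in the example are hypothetical.

```python
import math


def ability_for_raw_score(r: int, difficulties: list[float],
                          tol: float = 1e-6, max_iter: int = 100) -> float:
    """Maximum likelihood Rasch ability for raw score r (0 < r < n)."""
    n = len(difficulties)
    if not 0 < r < n:
        raise ValueError("extreme scores have no finite ML estimate")
    theta = math.log(r / (n - r))  # starting value
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        expected = sum(p)                           # expected raw score at theta
        info = sum(pi * (1.0 - pi) for pi in p)     # test information at theta
        step = (r - expected) / info                # Newton-Raphson update
        theta += step
        if abs(step) < tol:
            break
    return theta


# With items of equal difficulty, 80 of 100 and 64 of 80 give the same logit
print(ability_for_raw_score(80, [0.0] * 100))
print(ability_for_raw_score(64, [0.0] * 80))
```

On a shortened form, the raw score equivalent of a logit cutoff is simply the smallest raw score whose ability estimate reaches the cutoff; the cutoff itself is unchanged.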

Another scaling problem occurred when the decision was
made to alter the method of calibration for the Cloze reading
test. This subtest was initially calibrated by passage, i.e.,
one passage score was obtained for the set of ten items
associated with each passage. It was later decided to recalibrate
at the item level, which necessitated an adjustment to
correct the scale for the method of calibration. The correction
factor for the recalibration was calculated by finding the
difference in the unadjusted ability logits at the cut score
for the November calibrations by passage and by item. This
difference was added to the adjustment of the November passage
scale to the field test scale, as shown below. This additional
step was necessary because the original field test data were
not available for recalibration.



Cutoff (logits)

  Field test passage calibration             1.42
  Unadjusted November passage calibration    1.18
  Unadjusted November item calibration       1.57

To bring the November item calibration scale to
the passage scale, a factor of -.390 was required (1.18 - 1.57).
The adjustment of the November passage scale to the field test
was .243 (1.42 - 1.18). The manipulation of the scales was
completed by combining the two factors of -.390 and .243 to
create the final adjustment of -.147.
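Each factor is a constant shift of the logit scale, so the two compose by simple addition. A minimal sketch using the reported factors; the variable names are ours, and taking 1.57 as the unadjusted item-level cutoff is our reading of the -.390 factor.

```python
ITEM_TO_PASSAGE = -0.390   # November item scale -> November passage scale
PASSAGE_TO_FIELD = 0.243   # November passage scale -> field test scale

# Successive constant shifts compose by addition
FINAL_ADJUSTMENT = ITEM_TO_PASSAGE + PASSAGE_TO_FIELD


def to_field_test_scale(item_scale_logit: float) -> float:
    """Move a logit from the item-level calibration onto the field test scale."""
    return item_scale_logit + FINAL_ADJUSTMENT


print(round(FINAL_ADJUSTMENT, 3))           # -0.147
print(round(to_field_test_scale(1.57), 2))  # 1.42
```

Applied to the item-level cutoff of 1.57, the combined adjustment recovers the 1.42 field test cutoff, which is the check that the two shifts were combined in the right direction.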

Reliability of the Cutoff 

Since the passing score determined eligibility for teacher
certification, it was essential that the examination be reliable
at the cutoff as well as internally consistent. Reliability
was assessed at the cutoff by computing the standard error of
each subtest at the passing score and by estimating the
reliability of the cut scores using the Brennan-Kane index.
These index estimates ranged from .92 to .96 across subtests.
Since the index is sensitive to the average difficulty of
the items, reliability of the cutoff increases as the average
difficulty decreases.
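Both quantities can be sketched briefly. Under the Rasch model the standard error of an ability estimate at θ is 1/sqrt(Σ P_i(1 - P_i)), evaluated here at the cut score. For the Brennan-Kane index the sketch uses the form commonly cited for dichotomous items, Φ(λ) = 1 - [1/(n - 1)]·[X̄(1 - X̄) - s²]/[(X̄ - λ)² + s²], where X̄ and s² are the mean and variance of candidates' proportion-correct scores and λ is the cut expressed as a proportion correct. Whether the paper used exactly this estimator, and the use of the population variance, are assumptions.

```python
import math
from statistics import mean, pvariance


def sem_at_cut(cut: float, difficulties: list[float]) -> float:
    """Rasch standard error of measurement at ability = cut (logits)."""
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(cut - b)))
        info += p * (1.0 - p)  # item information at the cut
    return 1.0 / math.sqrt(info)


def brennan_kane(prop_scores: list[float], n_items: int, cut_prop: float) -> float:
    """Brennan-Kane dependability index for a cut on the proportion-correct scale."""
    xbar = mean(prop_scores)
    s2 = pvariance(prop_scores)  # assumption: population variance
    num = xbar * (1.0 - xbar) - s2
    den = (xbar - cut_prop) ** 2 + s2
    return 1.0 - num / ((n_items - 1) * den)
```

The (X̄ - λ)² term shows why the index depends on average difficulty: the further the mean proportion correct lies from the cut, the larger the denominator and the higher the index.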

Maintaining the Score Scale 

The steps outlined by Ron Mead in his paper entitled Basic
Ideas in Item Banking have been followed to investigate the
within form fit, the within link fit and the between link fit.
These analyses were conducted to monitor the stability of the
item calibrations and the score scale. The analysis was guided
by specific questions raised about the stability of the items
during the development of the item bank and the early
administrations of the examination. These questions
concerned the viability of the field test results and the
possible effects of different candidate populations on the
calibrations.
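As one illustration of a within-link check (a common form, not necessarily the statistic in Mead's paper), the shift between two calibrations of the same linking items can be estimated as their mean difference, and each item's departure from that shift standardized by the combined standard errors. Items with large standardized residuals are candidates for review. The 2.0 limit and the names are hypothetical.

```python
import math
from statistics import mean


def link_residuals(old, new):
    """
    old, new: {item: (logit, standard error)} from two calibrations.
    Returns (shift, {item: standardized residual}).
    """
    common = sorted(old.keys() & new.keys())
    shift = mean(old[i][0] - new[i][0] for i in common)
    resid = {}
    for i in common:
        (b_old, se_old), (b_new, se_new) = old[i], new[i]
        resid[i] = (b_old - b_new - shift) / math.sqrt(se_old ** 2 + se_new ** 2)
    return shift, resid


def flag_misfitting_links(old, new, limit=2.0):
    """Linking items whose standardized residual exceeds the limit."""
    _, resid = link_residuals(old, new)
    return [i for i, z in resid.items() if abs(z) > limit]
```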

Any field test analysis carries the possibility that the
candidates writing the examination have not taken the items
seriously. The item calibrations may yield difficulty
estimates that are too high simply because of a cavalier
attitude by some students. When cutoff scores are based
upon field test data, the possibility exists that they may
be too low in spite of the attempt to set the standards
on the basis of the skills rather than on the pass rate.
No standard can be set without consideration of its
possible effect. Therefore, the standards set for the
certification examination have been monitored to determine
whether or not the ability reflected in the field test was
representative of the ability scores derived from each
subsequent administration. To date it appears that a
slightly higher percentage of candidates than might have
been expected exceeds the cutoff based on field test
results.

The stability of the item calibrations was also questioned
due to possible differences in the candidates' knowledge of the
content. Even though the Rasch model is sample-free, it is not
impervious to the effect of differences in curriculum. The
specifications for the examination were based upon the curriculum
in teacher education programs within the state. However, substantial
numbers of candidates from other states come to Florida to teach
and must sit the examination. The reading and mathematics skills
could be considered basic and therefore common to all
students. The skills in professional education may vary in
emphasis depending upon the curriculum of an institution. Even
within the state there was considerable debate about the validity
of the examination for vocational teachers. It was possible
that the calibrations could change significantly during early
administrations, when the proportion of out of state and vocational
education candidates was large.

In order to investigate this possibility, linking item
difficulties were plotted across administrations to check that
they were parallel. Comparisons of adjusted item logit values
were made, and substantial changes in item calibrations were
examined. Standard item analyses were conducted for several
categories of examinees to evaluate the possibility of
differential success rates on items that could be tied to
differences in curriculum.
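A minimal sketch of the item analysis by examinee category, assuming a flat list of scored responses tagged with the candidate's group and the item's content area. It reports percent correct for each (group, content area) cell, so differential success rates can be compared across categories. The record layout and group labels are hypothetical.

```python
from collections import defaultdict


def percent_correct_by_group(responses):
    """
    responses: iterable of (group, content_area, correct) with correct in {0, 1}.
    Returns {(group, content_area): percent correct, rounded to a whole number}.
    """
    right = defaultdict(int)
    total = defaultdict(int)
    for group, area, correct in responses:
        right[(group, area)] += correct
        total[(group, area)] += 1
    return {key: round(100 * right[key] / total[key]) for key in total}


responses = [
    ("in-state", "Management", 1),
    ("in-state", "Management", 0),
    ("out-of-state", "Management", 1),
]
table = percent_correct_by_group(responses)
```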

Monitoring the Linking Items 

While all linking items were monitored in each subtest, the
links for the Professional Education subtest were of primary
concern. The possible effect of differences in the educational
training of the candidates would most likely be shown in an analysis
of this subtest. These specific questions guided the inquiry:

1. Were any shifts in values due to
ambiguity in the items?

2. Was there a systematic shift in difficulty from
the field test?

3. Was there a pattern of shifts in logit values
that could be related to curriculum differences?

The Professional Education subtest contains twenty-eight linking
items distributed proportionately across twenty-one competencies
within six content areas:

Instructional Objectives

Evaluating, Recording, and Reporting Student Progress

Classroom Management

Learning and Teaching

Development of Students

Instructional Materials

A difference of more than one half logit from the field test
was set as a significant change in value. The plot (see Figure 1)
shows that fourteen items changed in value. Eight items
increased in difficulty and six decreased. However, five of these
items deviated in only one of their four administrations. Four
of these five deviant logits appeared in either the field test or
the first administration. Outliers from these administrations
were expected and are relatively few in number.
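The flagging rule can be sketched as follows, assuming each item's logits are listed in administration order with the field test first. An item is flagged when any later value differs from its field test value by more than one half logit, and the direction of its largest deviation classifies it as harder or easier. The input format is hypothetical.

```python
def classify_shift(logits, limit=0.5):
    """
    logits: [field test, November, April, August] logits for one item.
    Returns "harder", "easier", or None when no later value moves more than limit.
    """
    base = logits[0]
    diffs = [x - base for x in logits[1:]]
    largest = max(diffs, key=abs)
    if abs(largest) <= limit:
        return None
    return "harder" if largest > 0 else "easier"


def tally(items, limit=0.5):
    """items: {item_id: logits}. Returns counts of harder and easier items."""
    counts = {"harder": 0, "easier": 0}
    for logits in items.values():
        label = classify_shift(logits, limit)
        if label:
            counts[label] += 1
    return counts
```

Separating items that deviate in only one administration from those that shift consistently would need an extra count of how many later values exceed the limit; that step is omitted here.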



There appeared to be no relationship between the content and
the shift in logit values. The fourteen items represented ten
different competencies within all six content areas. The fluctuation
of the logits was random with respect to the curriculum. Since similar
results were found in the Mathematics and Reading subtests,
the likelihood of any pattern being found was small.

The item analyses revealed that out of state candidates
generally performed better than in-state candidates on all subtests.
The changes in the values of the linking items were unrelated
to the proportion of out of state candidates in each administration.

Candidates for vocational-technical certification did less
well on all subtests than the in-state or out of state groups.
However, they performed better on professional education
items than on basic skills. Thus, their results were most
likely due to a generally lower level of academic skills.
All of the evidence points to the conclusion that there
is no bias in the test due to differences in curriculum. Each
content area has similar success rates for each of the population
categories discussed (see Figure 2). The proportion of items
necessary to pass the examination remains relatively constant,
varying only with small differences in test difficulty, and
difficulty is not related to differences in content or curriculum.

Figure 1. Logits: Professional Education

[Plot of each Professional Education linking item's logit at each administration (1 = Field test, 2 = November, 3 = April, 4 = August); not reproduced.]

Figure 2. Percent of Items Correct by Content-Based Categories

            (1)         (2)          (3)          (4)        (5)         (6)
            Management  Development  Measurement  Materials  Objectives  Learning
November    73          75           73           77         66          73
April       76          73           80           77         80          77